Alessandro Pomponio

Matricola 0000920265

  1. a pairplot of the data (see Seaborn pairplot) and a comment on remarkable situations, if any (5pt)
  2. a classification model using a method of your choice with the schema "train-validation-test" exploring an appropriate range of parameter values (5pt)
  3. the optimal parameter(s) (5pt)
  4. a scatter plot of the test set using a pair of attributes of your choice with the class as colour (5pt)
  5. ... and the good/bad prediction as the point style (5pt)

1. A pairplot of the data (see Seaborn pairplot) and a comment on remarkable situations, if any (5pt)

The pairplots don't seem to show any particular pattern in the data.

2. A classification model using a method of your choice with the schema "train-validation-test" exploring an appropriate range of parameter values (5pt)

For this classification assignment, we will use a Decision Tree.

We start by separating the feature matrix from the target column:
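A minimal sketch of this separation, assuming the data sits in a pandas DataFrame whose target column is called "Class" (a hypothetical name, as are the feature names); synthetic data stands in for the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the exam data (assumption)
X_raw, y_raw = make_classification(n_samples=300, n_features=4, n_informative=3,
                                   n_redundant=0, n_classes=3,
                                   n_clusters_per_class=1, random_state=42)
df = pd.DataFrame(X_raw, columns=["C00", "C01", "C02", "C03"])
df["Class"] = y_raw

target = "Class"             # hypothetical target column name
X = df.drop(columns=target)  # feature matrix: every column except the target
y = df[target]               # target column
```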

We now split the data into a training set and a test set by means of the train_test_split function.
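The split could look like the sketch below. The 70/30 proportion and the fixed random_state are assumptions chosen for illustration, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exam data (assumption)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=42)

# Hold out 30% of the rows for testing (assumed proportion);
# random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```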

We now instantiate a Decision Tree and fit it on the training data.

We then use it to predict the labels of the training data, and measure the accuracy on the training set with the accuracy_score function.

For a more meaningful result, we also evaluate it on the test set, which gives a "baseline" value for the performance of our classifier.
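The fit and the two accuracy measurements described above can be sketched as follows, again on synthetic stand-in data with an assumed 70/30 split. A fully grown tree usually reproduces its own training labels, which is why the test score is the more informative of the two.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the exam data (assumption)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Decision tree with default hyperparameters (no depth limit)
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# Accuracy on the data the tree was fitted on, then on unseen data
train_accuracy = accuracy_score(y_train, tree.predict(X_train))
test_accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"training accuracy: {train_accuracy:.3f}")
print(f"test accuracy (baseline): {test_accuracy:.3f}")
```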

Since our assignment is to use the train-validation-test schema, we split the test data once more, into a validation set and a test set.
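One way to obtain the three sets is to call train_test_split twice, as in this sketch. The proportions (70% training, then the held-out part halved into validation and test) are assumptions, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exam data (assumption)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=42)

# First split: training set versus held-out data
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Second split: the held-out data becomes a validation set and a final test set
X_val, X_test, y_val, y_test = train_test_split(
    X_holdout, y_holdout, test_size=0.5, random_state=42
)
```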

We now record the depth reached by the tree with default hyperparameters. This gives us a reference range over which to vary the maximum depth, so that we can see which value fits our data best.
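A possible shape for this exploration is sketched below: the depth of the unconstrained tree is read with get_depth(), then one tree per max_depth value from 1 up to that depth is fitted on the training set and scored on the validation set. The dictionary name val_scores, the split proportions and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the exam data (assumption)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=42)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_holdout, y_holdout, test_size=0.5, random_state=42
)

# Depth reached when the tree is allowed to grow freely
default_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
default_depth = default_tree.get_depth()

# Fit one tree per candidate depth and score it on the validation set
val_scores = {}
for depth in range(1, default_depth + 1):
    candidate = DecisionTreeClassifier(max_depth=depth, random_state=42)
    candidate.fit(X_train, y_train)
    val_scores[depth] = accuracy_score(y_val, candidate.predict(X_val))

for depth, score in val_scores.items():
    print(f"max_depth={depth}: validation accuracy {score:.3f}")
```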

3. The optimal parameter(s) (5pt)

We now have a look at the accuracy scores that we obtained in the previous step

The best hyperparameter configuration is the one that maximises the accuracy

We now test our tuned hyperparameter on the old training data and compute its accuracy
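The selection and the final check can be sketched as below. This is one reading of the step above: the tuned tree is refitted on the training split and scored on the held-out test split, next to the default tree for comparison. Data, split proportions and variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the exam data (assumption)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=42)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_holdout, y_holdout, test_size=0.5, random_state=42
)

default_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Validation accuracy for every depth up to the default tree's depth
val_scores = {}
for depth in range(1, default_tree.get_depth() + 1):
    candidate = DecisionTreeClassifier(max_depth=depth, random_state=42)
    candidate.fit(X_train, y_train)
    val_scores[depth] = accuracy_score(y_val, candidate.predict(X_val))

# max() returns the first maximum, so ties go to the shallowest tree
best_depth = max(val_scores, key=val_scores.get)

# Refit with the selected depth and compare with the default tree on the test set
tuned_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
tuned_tree.fit(X_train, y_train)
default_test_accuracy = accuracy_score(y_test, default_tree.predict(X_test))
tuned_test_accuracy = accuracy_score(y_test, tuned_tree.predict(X_test))
print(f"default: depth {default_tree.get_depth()}, test accuracy {default_test_accuracy:.3f}")
print(f"tuned:   depth {tuned_tree.get_depth()}, test accuracy {tuned_test_accuracy:.3f}")
```

Because the candidate depths are visited in increasing order, breaking ties in favour of the first maximum prefers the smaller tree whenever two depths reach the same validation accuracy.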

The results show that we obtained the same accuracy but with a smaller tree

4. A scatter plot of the test set using a pair of attributes of your choice with the class as colour (5pt)

Let us choose C01 and C02 as the attribute pair we will consider
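A scatter plot of this kind could be drawn as in the following sketch, with one scatter call per class so that matplotlib assigns each class its own colour. The data is a synthetic stand-in; the column names, the target name "Class" and the labels a, b, c are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exam data (assumption)
X, y_int = make_classification(n_samples=300, n_features=4, n_informative=3,
                               n_redundant=0, n_classes=3,
                               n_clusters_per_class=1, random_state=42)
df = pd.DataFrame(X, columns=["C00", "C01", "C02", "C03"])  # hypothetical names
df["Class"] = np.array(["a", "b", "c"])[y_int]              # hypothetical labels
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)

# One scatter call per class: each call takes the next colour of the default cycle
fig, ax = plt.subplots()
for label, group in test_df.groupby("Class"):
    ax.scatter(group["C01"], group["C02"], label=label)
ax.set_xlabel("C01")
ax.set_ylabel("C02")
ax.legend(title="Class")
```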

We notice that the items of class c tend to be in the lower part of the plot

5. ... and the good/bad prediction as the point style (5pt)
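One way to encode both pieces of information is sketched below: colour identifies the true class, and the marker distinguishes correct predictions ("o") from wrong ones ("x"). The data is synthetic, columns 1 and 2 of the array play the role of C01 and C02, and the class labels and the default tree used for the predictions are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the exam data (assumption)
X, y_int = make_classification(n_samples=300, n_features=4, n_informative=3,
                               n_redundant=0, n_classes=3,
                               n_clusters_per_class=1, random_state=42)
y = np.array(["a", "b", "c"])[y_int]  # hypothetical class labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
correct = model.predict(X_test) == y_test  # True where the prediction is right

fig, ax = plt.subplots()
classes = np.unique(y)
for i, label in enumerate(classes):
    for is_correct, marker in ((True, "o"), (False, "x")):
        # Test points of this class with this prediction outcome
        mask = (y_test == label) & (correct == is_correct)
        outcome = "good" if is_correct else "bad"
        ax.scatter(X_test[mask, 1], X_test[mask, 2], color=f"C{i}",
                   marker=marker, label=f"{label} ({outcome})")
ax.set_xlabel("C01")
ax.set_ylabel("C02")
ax.legend()
```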